Abstract: Many wireless communication systems such as IS-54, enhanced data rates for GSM evolution
(EDGE), worldwide interoperability for microwave access (WiMAX) and long term evolution (LTE) have
adopted low-density parity-check (LDPC), tail-biting convolutional, and turbo codes as the forward error
correcting (FEC) codes for data and overhead channels. Therefore, many efficient algorithms have been
proposed for decoding these codes. However, the different decoding approaches for tail-biting convolutional
and turbo codes usually lead to different hardware architectures. Since these codes work side by side in these
new wireless systems, it is a good idea to introduce a universal decoder to handle both families of codes. The
present work exploits the parity-check matrix (H) representation of tail-biting convolutional and turbo codes,
thus enabling decoding via a unified belief propagation (BP) algorithm. Indeed, the BP algorithm provides a
highly effective general methodology for devising low-complexity iterative decoding algorithms for all
convolutional code classes as well as turbo codes. While a small performance loss is observed when decoding
turbo codes with BP instead of MAP, this is offset by the lower complexity of the BP algorithm and the inherent
advantage of a unified decoding architecture.

I. INTRODUCTION 

Until recently, most known decoding algorithms for convolutional codes were based on either algebraically 
calculating the error pattern or on trellis graphical representations such as in the MAP and Viterbi algorithms. With the 
advent of turbo coding [1], a third decoding principle has appeared: iterative decoding. Iterative decoding was also 
introduced in Tanner's pioneering work [2], which is a general framework based on bipartite graphs for the description of 
LDPC codes and their decoding via the belief propagation (BP) algorithm. 

In many respects, convolutional codes are similar to block codes. For example, if we truncate the trellis by which a 
convolutional code is represented, a block code whose codewords correspond to all trellis paths to the truncation depth is 
created. However, this truncation causes a problem in error performance, since the last bits lack error protection. The 
conventional solution to this problem is to encode a fixed number of message blocks L followed by m additional all-zero 
blocks, where m is the constraint length of the convolutional code [4]. This method provides uniform error protection for all 
information digits, but causes a rate reduction for the block code as compared to the convolutional code by the multiplicative 
factor L/(L + m). In the tail-biting convolutional code, zero-tail bits are not needed and are replaced by payload bits, resulting in
no rate loss due to the tails. Therefore, the spectral efficiency of the channel code is improved. Due to the advantages of the
tail-biting method over the zero-tail method, it has been adopted as the FEC scheme, in addition to the turbo code, for data and
overhead channels in many wireless communication systems such as IS-54, EDGE, WiMAX and LTE [5, 6, 7].
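
The rate reduction factor L/(L + m) can be checked numerically. The following is a minimal sketch; the function names and the numbers used (a rate-1/2 code, L = 24 message blocks, m = 6 tail blocks) are illustrative assumptions and are not taken from any of the cited standards.

```python
from fractions import Fraction


def zero_tail_rate(k, n, L, m):
    """Effective rate when L message blocks are followed by m all-zero blocks."""
    return Fraction(k, n) * Fraction(L, L + m)


def tail_biting_rate(k, n):
    """Tail-biting needs no tail blocks, so the mother code rate is kept."""
    return Fraction(k, n)


# Illustrative values only (hypothetical block length and memory).
print(zero_tail_rate(1, 2, 24, 6))   # 2/5
print(tail_biting_rate(1, 2))        # 1/2
```

For short blocks the penalty of the zero tail is substantial (here 0.4 instead of 0.5), which is why tail-biting is attractive for short payloads.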

Both turbo and LDPC codes have been extensively studied for more than fifteen years. However, the formal 
relationship between these two classes of codes remained unclear until MacKay in [8] claimed that turbo codes are LDPC
codes. Also, Wiberg in [9] made another attempt to relate these two classes of codes by developing a unified
factor graph representation for them. In [10], McEliece showed that their decoding algorithms fall into
the same category as BP on the Bayesian belief network. Finally, Colavolpe [11] was able to demonstrate the use of the BP
algorithm to decode convolutional and turbo codes. However, the operation in [11] is limited to specific classes of convolutional
codes, such as convolutional self-orthogonal codes (CSOCs). Also, the turbo codes therein are based on the serial structure
while the parallel structure is more prevalent in practical applications. 

In LTE and WiMAX systems, the proposed decoders for the tail-biting convolutional codes and turbo codes are 
based on the Viterbi and MAP algorithms, respectively. However, many other efficient algorithms have been proposed to 
decode tail-biting convolutional codes as well as turbo codes. For example, in [3], the reduced complexity wrap-around 
Viterbi algorithm was proposed to decode tail-biting convolutional codes in the WiMAX system to reduce the average 
number of decoding iterations and memory usage. In addition, other decoding algorithms such as double traceback and 
bidirectional Viterbi algorithms were also proposed for tail-biting convolutional codes in LTE [4]. Finally, in [5], the design
and optimization of low-complexity, high-performance rate-matching algorithms based on circular buffers for LTE turbo
codes were investigated.

In this paper, we focus on the direct application of the BP algorithm used for LDPC codes to decode the tail-biting 
convolutional codes and turbo codes in WiMAX and LTE systems, respectively. Based on that, we propose a decoder with 
drastically lower implementation complexity than that proposed in the latest releases for these systems [5-7]. The rest of this 
paper is organized as follows. In Sections II and III, the graphical representations of the tail-biting convolutional and turbo
codes are introduced, together with the necessary notation and definitions used throughout this paper, followed by an
investigation of the coding structures in WiMAX and LTE systems in Section IV. In Section V, simulation results for the
performance of tail-biting convolutional and turbo codes using the proposed algorithm are presented, followed in Section VI
by a complexity comparison between the proposed algorithm and the traditional ones. Finally, the paper is concluded in
Section VII.

II. CONVOLUTIONAL CODES 

First introduced by Elias in 1955 [15], binary convolutional codes are one of the most popular forms of binary 
error correcting codes that have found numerous applications [16]. A convolutional code is called tail-biting when its 
codewords are those code sequences associated with paths in the trellis that start from a state equal to the last m bits of an 
information vector of k data bits. Many efficient algorithms have been proposed for decoding tail-biting convolutional codes 
such as the Viterbi and MAP algorithms. As shown below, we represent the tail-biting convolutional code by its generator 
and parity-check matrices in order to apply the BP algorithm directly. 

A. Parity-check matrix of tail-biting convolutional codes 

To be able to represent a tail-biting convolutional code by a Tanner graph and then apply the BP algorithm to its 
decoding, a prerequisite is to obtain its generator matrix G and its parity-check matrix H. We introduce the matrix
representation by a simplified example as follows: 

Example 1: Consider the convolutional code with rate R = k/n = 1/2, where k represents the number of input bits and n the
number of output bits. Assume that the information sequence is $x = (x_0, x_1, x_2, \ldots)$. The encoder converts this sequence
into n = 2 coded output sequences. Note that if there are multiple input streams, we can refer to a single interleaved input
sequence x. Also, the output streams are multiplexed to create a single coded data stream y, where y is the convolutional
codeword. In addition, each element in the interleaved output stream y is a linear combination of the elements in the input
stream x.

An impulse response $g^{(j)}$ is obtained from the j-th encoder output by applying a single 1 at the input followed by a
string of zeros; strings of zeros are applied to all the other inputs (in the case of multiple inputs). The impulse responses
for the encoder in our example are

The impulse responses are often referred to as generator sequences, because their relationship to the codewords
generated by the corresponding convolutional encoder is similar to that between generator polynomials and codewords in a 
cyclic code. The generator sequences can be expressed in the following general form: 

Each coded output sequence $y^{(j)}$ in a rate-1/n code is the convolution of the input sequence x and the impulse response
$g^{(j)}$:

\[ y^{(j)} = x * g^{(j)} \tag{2} \]
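
The convolution and multiplexing steps can be sketched in a few lines. This is an illustration under stated assumptions: the generator sequences (1011) and (1101) are not given explicitly in the surviving text; they are inferred here from the first row of G in Example 2 (11 01 10 11 00 00), and the function names are hypothetical.

```python
def gf2_convolve(x, g):
    """Convolve two binary sequences using modulo-2 arithmetic."""
    y = [0] * (len(x) + len(g) - 1)
    for i, xi in enumerate(x):
        if xi:
            for l, gl in enumerate(g):
                y[i + l] ^= gl
    return y


def multiplex(streams):
    """Interleave the output streams symbol by symbol into one codeword."""
    return [bit for symbols in zip(*streams) for bit in symbols]


# ASSUMPTION: generator sequences inferred from the first row of G in Example 2.
G_SEQS = [[1, 0, 1, 1], [1, 1, 0, 1]]


def encode(x):
    """Zero-state, unterminated rate-1/2 encoding of the input sequence x."""
    return multiplex([gf2_convolve(x, g) for g in G_SEQS])


# A single 1 at the input returns the interleaved impulse responses.
print(encode([1]))   # [1, 1, 0, 1, 1, 0, 1, 1]
```

Because the encoder is linear, the codeword for any input is the modulo-2 sum of shifted copies of this impulse response, which is what the matrix form developed next expresses.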

In vector form, this is expressed

which can be developed thus

We can express these forms as a matrix multiplication operation, thus providing a generator matrix similar to that
developed for block codes. In fact, the primary difference arises from the fact that the input sequence is not necessarily
bounded in length, and thus the generator and parity-check matrices for convolutional codes are semi-infinite. However,
herein we introduce the G and H matrices as equivalent to a tail-biting convolutional code having finite length. Therefore,
the generator and parity-check matrices will be as follows [4]:

where

Note that each block of k rows in the G matrix is a circular shift by n positions of the previous such block. In
general, the parity-check matrix of a rate k/n tail-biting convolutional code with constraint length m is

where I is the k x k identity matrix, 0 is the k x k all-zero matrix, and $P_i$, i = 0, 1, ..., m, is a k x (n - k) matrix whose
entries are

Here, $g_{p,i}^{(j)}$ is equal to 1 or 0 according to whether or not the i-th stage of the shift register for input p
contributes to output j (i = 0, 1, ..., m; j = (k + 1), (k + 2), ..., n; p = 1, 2, ..., k). Since the last m bits serve as the
starting state and are also fed into the encoder, there is an end-around shift phenomenon for the last m columns of H.

Example 2: Consider the encoder of Example 1 and assume that a block of k = 6 information bits is encoded.
Then the tail-biting construction gives a binary (12, 6) code with generator and parity-check matrices

G =
11 01 10 11 00 00
00 11 01 10 11 00
00 00 11 01 10 11
11 00 00 11 01 10
10 11 00 00 11 01
01 10 11 00 00 11

and

H =
...
11 00 00 11 01 10
10 11 00 00 11 01
01 10 11 00 00 11
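
The circular-shift structure of the tail-biting generator matrix can be reproduced programmatically. The sketch below assumes the generator sequences (1011) and (1101), which are inferred from the first row printed in Example 2 rather than stated in the text; all function names are hypothetical. For this particular pair of sequences the rows turn out to be mutually orthogonal over GF(2), so the same set of rows also satisfies the parity-check condition.

```python
def tail_biting_generator(g_seqs, k):
    """Build the k x (n*k) tail-biting generator matrix of a rate-1/n code.

    Row r is the interleaved generator sequences circularly shifted
    to the right by n*r positions.
    """
    n = len(g_seqs)
    first = [bit for symbols in zip(*g_seqs) for bit in symbols]
    if len(first) > n * k:
        raise ValueError("block too short for the encoder memory")
    first += [0] * (n * k - len(first))
    rows = []
    for r in range(k):
        s = n * r
        rows.append(first[-s:] + first[:-s] if s else list(first))
    return rows


def gf2_rank(rows):
    """Rank over GF(2) by Gaussian elimination on integer bitmasks."""
    masks = [int("".join(str(b) for b in row), 2) for row in rows]
    rank = 0
    for bit in reversed(range(len(rows[0]))):
        pivot = next((i for i in range(rank, len(masks)) if (masks[i] >> bit) & 1), None)
        if pivot is None:
            continue
        masks[rank], masks[pivot] = masks[pivot], masks[rank]
        for i in range(len(masks)):
            if i != rank and (masks[i] >> bit) & 1:
                masks[i] ^= masks[rank]
        rank += 1
    return rank


def is_self_orthogonal(rows):
    """True if every pair of rows has an even overlap (G * G^T = 0 mod 2)."""
    return all(sum(a & b for a, b in zip(r1, r2)) % 2 == 0 for r1 in rows for r2 in rows)


# ASSUMPTION: generator sequences inferred from the first row of G in Example 2.
G = tail_biting_generator([[1, 0, 1, 1], [1, 1, 0, 1]], k=6)
for row in G:
    print("".join(str(b) for b in row))
print(gf2_rank(G), is_self_orthogonal(G))
```

The generated rows include the four rows that appear in Example 2, which supports the circular-shift reading of the construction.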

B. Degree distribution of Tanner graph for tail-biting convolutional codes

Looking at the H matrix of a tail-biting convolutional code, we can notice that it is similar to the H matrix of an 
irregular LDPC code where the number of non-zero elements is not a fixed number per row and column. Our goal is to 
represent the tail-biting convolutional codes through Tanner graphs in order to decode them using the BP algorithm. 
Therefore, it is important to obtain the degree distribution of the Tanner graph which describes the number of edges into the 
bit and check nodes in irregular LDPC codes. The fraction of edges which are connected to degree-i bit nodes is denoted
$\lambda_i$, and the fraction of edges which are connected to degree-i check nodes is denoted $\rho_i$. The functions

\[ \lambda(x) = \lambda_2 x + \lambda_3 x^2 + \cdots + \lambda_i x^{i-1} + \cdots \tag{9} \]

\[ \rho(x) = \rho_2 x + \rho_3 x^2 + \cdots + \rho_i x^{i-1} + \cdots \tag{10} \]

are defined to describe the degree distributions.
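
As an illustration of these definitions, the edge-perspective fractions can be computed directly from a parity-check matrix. This is a minimal sketch: the 3 x 6 matrix below is a made-up toy example, not one of the matrices of this paper, and the function name is hypothetical. The dictionaries map a degree i to the coefficient of $x^{i-1}$.

```python
from collections import Counter
from fractions import Fraction


def degree_distributions(H):
    """Return (lambda, rho): fraction of edges attached to degree-i bit/check nodes."""
    col_deg = [sum(col) for col in zip(*H)]   # bit (variable) node degrees
    row_deg = [sum(row) for row in H]         # check node degrees
    edges = sum(row_deg)
    lam = {d: Fraction(d * c, edges) for d, c in sorted(Counter(col_deg).items())}
    rho = {d: Fraction(d * c, edges) for d, c in sorted(Counter(row_deg).items())}
    return lam, rho


# Toy parity-check matrix (hypothetical), with irregular column weights.
H_toy = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 1, 0, 0, 0, 1],
]
lam, rho = degree_distributions(H_toy)
print(lam)   # {1: Fraction(4, 9), 2: Fraction(2, 9), 3: Fraction(1, 3)}
print(rho)   # {3: Fraction(1, 1)}
```

Both sets of coefficients sum to one, since every edge is attached to exactly one bit node and one check node.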

III. TURBO CODES

To replace the traditional decoders of turbo codes by the BP decoder, we have to obtain the parity-check matrix for
the turbo code as was done in the previous section for the tail-biting convolutional codes.

A. Parity-check matrix for turbo codes

Let us consider a recursive systematic convolutional (RSC) code C of rate R = 1/2. It has two generator
polynomials $g_1(X)$ and $g_2(X)$ of degree at most v (i.e., with v + 1 coefficients), where v is the memory of the encoder.
Let u(X) be the input of the encoder and $x_1(X)$ and $x_2(X)$ its outputs. We consider this code as a block code obtained
from the zero-tail truncation of the RSC. Using the parity-check matrix H of the RSC, we can do some column permutations and
rewrite the H matrix as $H_{new}$, where $H_{new} = [H_1 \; H_2]$. As mentioned before, we consider a conventional turbo code
resulting from the parallel concatenation of two identical RSC codes C whose common inputs are separated by an interleaver of
length N, represented by a matrix M of size N x N with exactly one nonzero element per row and column. It is well known that
the superior performance of turbo codes is primarily due to the interleaver, i.e., due to the cycle structure of the Tanner
graph [17]. Hence, the parity-check matrix of the whole turbo code is (for a detailed proof, the reader is referred to [18]):

Example 3: Let us now consider the special case of an RSC code C of rate R = 1/2 whose input u(X) has a finite degree
N - 1 (i.e., the input vector has size N). Its parity-check matrix H can now be written as an N x 2N matrix over GF(2) whose
coefficients are fixed by its generator polynomials. The first (respectively second) N x N part of H consists of shifted rows
representing the coefficients of $g_2(X)$ (respectively $g_1(X)$). For example, choosing $g_1 = 101$, $g_2 = 111$, and N = 8, we have:
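
The band structure described in Example 3 can be sketched as follows. The code builds an N x 2N matrix from shifted copies of $g_2$ and $g_1$ and checks that truncated RSC codewords satisfy it. The codeword ordering (all systematic bits first, then all parity bits) and the function names are assumptions made for this illustration.

```python
from itertools import product


def band_matrix(g, N):
    """N x N lower-triangular band matrix with entry (t, t - j) = g[j]."""
    return [[g[t - c] if 0 <= t - c < len(g) else 0 for c in range(N)] for t in range(N)]


def rsc_parity_check(g1, g2, N):
    """H = [H1 | H2]: H1 holds shifted copies of g2, H2 shifted copies of g1."""
    return [r1 + r2 for r1, r2 in zip(band_matrix(g2, N), band_matrix(g1, N))]


def rsc_encode(u, g1, g2):
    """Systematic output x1 = u and parity x2 = u * g2 / g1 (first len(u) bits)."""
    x2 = []
    for t in range(len(u)):
        bit = 0
        for j, c in enumerate(g2):              # feedforward part
            if c and t - j >= 0:
                bit ^= u[t - j]
        for j, c in enumerate(g1):              # feedback part (g1[0] must be 1)
            if j > 0 and c and t - j >= 0:
                bit ^= x2[t - j]
        x2.append(bit)
    return list(u), x2


def syndrome(H, word):
    return [sum(h & w for h, w in zip(row, word)) % 2 for row in H]


g1, g2, N = [1, 0, 1], [1, 1, 1], 8
H = rsc_parity_check(g1, g2, N)
for row in H:
    print("".join(str(b) for b in row))

# Every one of the 2^N truncated codewords should give an all-zero syndrome.
ok = all(
    not any(syndrome(H, sum(rsc_encode(list(u), g1, g2), [])))
    for u in product([0, 1], repeat=N)
)
print(ok)
```

Each row of either half contains at most v + 1 = 3 ones, so the matrix stays sparse as N grows.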

Note that the number of non-zero elements per row and
per column in the "diagonal" sub-matrices $H_1$ and $H_2$ is upper bounded by L = v + 1, the constraint length of the constituent
codes, which is always very small in comparison to the length of the interleaver. Also, the interleaver does not change the
weights of the sub-matrix $H_2 M^T$. As with the tail-biting convolutional code, the H
matrix for a turbo code can also be seen as the H matrix of an irregular LDPC code, since the number of non-zero
elements per row and column is not strictly constant, but always very small compared to the size of the parity-check matrix.
Then, the parity-check matrix for the mentioned turbo code in our example will be as follows:

Following (9) and (10) provided in the previous section, the degree distribution of this turbo code is given by

\[ \lambda(x) = 0.291667x + 0.083333x^2 + 0.25x^3 + 0.16667x^4 + 0.208333x^5 \tag{14} \]

\[ \rho(x) = 0.083333x^2 + 0.125x^3 + 0.033333x^4 + 0.208333x^5 + 0.25x^5 \tag{15} \]

IV. WIMAX AND LTE CODING STRUCTURE 

To address the low and high rate requirements of LTE, the 3rd Generation Partnership Project (3GPP) working
group undertook a rigorous study of advanced channel coding candidates such as tail-biting convolutional and turbo codes
for low and high data rates, respectively. We investigate here the application of the BP decoder for the proposed turbo code
in LTE systems. Meanwhile, a rate-1/2, memory-6 tail-biting convolutional code has been adopted in the WiMAX (802.16e)
system, because it has the best minimum distance and the smallest number of minimum-weight codewords for payloads larger
than 32 bits; it is used for both the frame control header (FCH) and data channels. In fact, we will focus here on the FCH,
which has much shorter payload sizes (12 and 24 bits), as shown in the next subsection.

A. Tail-biting convolutional code in 802.16e

Here, we briefly describe the WiMAX frame control header structure. In the WiMAX Orthogonal Frequency
Division Multiplexing (OFDM) physical layer, the payload size of the frame control header is either 24 bits or 12 bits and
the smallest unit for generic data packet transmission is one subchannel. A subchannel consists of 48 QPSK symbols (96
coded bits). At a code rate of 1/2, one subchannel translates to 48 bits as the smallest information block size. Currently, the
FCH payload bits are repeated to meet the minimum number (48) of encoder information bits. The generator polynomials for
the rate-1/2 WiMAX tail-biting convolutional code are given by $g_1 = (1011011)$ and $g_2 = (1111001)$ in binary notation.
According to [19], these generator polynomials have the best $d_{min}$ (minimum distance) and $n_{min}$ (number of codewords
with weight $d_{min}$) for payload sizes > 33 bits and for some payload sizes between 25 and 33 bits, under the constraint of
memory size m = 6 and code rate 1/2.
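
The FCH encoding chain described above (repetition to 48 bits, then rate-1/2 tail-biting encoding to 96 coded bits) can be sketched as follows. The tap ordering (leftmost generator bit acting on the current input bit), the payload pattern and the function names are assumptions made for this illustration; only the generator polynomials and block sizes come from the text.

```python
G1 = [1, 0, 1, 1, 0, 1, 1]  # g_1 = (1011011), as given in the text
G2 = [1, 1, 1, 1, 0, 0, 1]  # g_2 = (1111001), as given in the text
MEMORY = 6


def repeat_to_block(payload, block_size=48):
    """Repeat the payload until it fills one encoder input block."""
    reps, rem = divmod(block_size, len(payload))
    if rem:
        raise ValueError("payload length must divide the block size")
    return list(payload) * reps


def tail_biting_encode(bits):
    """Rate-1/2 tail-biting encoding via circular convolution.

    Reading past bits modulo the block length is equivalent to preloading
    the shift register with the last MEMORY information bits.
    ASSUMPTION: the leftmost tap of each generator acts on the current bit.
    """
    K = len(bits)
    coded = []
    for t in range(K):
        window = [bits[(t - j) % K] for j in range(MEMORY + 1)]
        coded.append(sum(g & b for g, b in zip(G1, window)) % 2)
        coded.append(sum(g & b for g, b in zip(G2, window)) % 2)
    return coded


fch = [1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0] * 2   # arbitrary 24-bit payload
block = repeat_to_block(fch)                      # 48 information bits
codeword = tail_biting_encode(block)              # 96 coded bits = 48 QPSK symbols
print(len(block), len(codeword))                  # 48 96
```

A useful sanity check of the tail-biting property is that rotating the information block by one position rotates the codeword by n = 2 positions, which is exactly the circular-shift structure of the generator matrix in Section II.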



B. Turbo code in LTE system 

The 3GPP turbo code is a systematic parallel concatenated convolutional code (PCCC) with two 8-state constituent
encoders and one turbo code internal interleaver. Each constituent encoder is independently terminated by tail bits. For an 
input block size of K bits, the output of a turbo encoder consists of three length K streams, corresponding to the systematic 
and two parity bit streams (referred to as the "Systematic", "Parity 1", and "Parity 2" streams in the following), respectively, 
as well as 12 tail bits due to trellis termination. Thus, the actual mother code rate is slightly lower than 1/3. In LTE, the tail 
bits are multiplexed to the end of the three streams, whose lengths are hence increased to (K + 4) bits each [5]. The transfer 
function of the 8-state constituent code for the PCCC is: 

\[ G(D) = \left[ 1, \; \frac{g_1(D)}{g_0(D)} \right] \tag{16} \]

The initial value of the shift registers of the 8-state constituent encoders will be all zeros when starting to encode
the input bits. The output from the turbo encoder is $d_k^{(0)} = x_k$, $d_k^{(1)} = z_k$, and $d_k^{(2)} = z'_k$ for
k = 0, 1, 2, ..., K-1. If the code block to be encoded is the 0-th code block and the number of filler bits is greater than
zero, i.e., F > 0, then the encoder will set $c_k = 0$, k = 0, ..., (F-1) at its input and will set $d_k^{(0)} = <NULL>$,
k = 0, ..., (F-1) and $d_k^{(1)} = <NULL>$, k = 0, ..., (F-1) at its output [5]. The bits input to the turbo encoder are
denoted by $c_0, c_1, c_2, c_3, \ldots, c_{K-1}$, and the bits output from the first and second 8-state constituent encoders
are denoted by $z_0, z_1, z_2, z_3, \ldots, z_{K-1}$ and $z'_0, z'_1, z'_2, z'_3, \ldots, z'_{K-1}$, respectively.
The bits output from the turbo code internal interleaver are denoted by $c'_0, c'_1, \ldots, c'_{K-1}$, and these bits are to
be the input to the second 8-state constituent encoder.
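
The encoder structure described above can be sketched as follows. The feedback and feedforward polynomials $g_0(D) = 1 + D^2 + D^3$ and $g_1(D) = 1 + D + D^3$ are those specified for the LTE constituent code in 3GPP TS 36.212. Everything else is an assumption made for illustration: the function names, the test pattern, the ordering of the tail bits, and in particular the permutation, which is a simple placeholder and not the LTE internal interleaver.

```python
def constituent_encode(c):
    """8-state RSC encoder with feedback g0(D) = 1 + D^2 + D^3 and
    feedforward g1(D) = 1 + D + D^3, followed by trellis termination.

    Returns (parity bits, 3 systematic tail bits, 3 parity tail bits, final state).
    """
    s1 = s2 = s3 = 0                      # shift register starts all-zero
    z = []
    for bit in c:
        a = bit ^ s2 ^ s3                 # feedback taps D^2 and D^3
        z.append(a ^ s1 ^ s3)             # feedforward taps 1, D and D^3
        s1, s2, s3 = a, s1, s2
    x_tail, z_tail = [], []
    for _ in range(3):                    # input := feedback, so a = 0
        x_tail.append(s2 ^ s3)
        z_tail.append(s1 ^ s3)
        s1, s2, s3 = 0, s1, s2
    return z, x_tail, z_tail, (s1, s2, s3)


def turbo_encode(c, perm):
    """Parallel concatenation: returns systematic, parity 1, parity 2, tail bits."""
    z, xt1, zt1, _ = constituent_encode(c)
    c_interleaved = [c[p] for p in perm]
    z_prime, xt2, zt2, _ = constituent_encode(c_interleaved)
    tail = xt1 + zt1 + xt2 + zt2          # 12 bits; the ordering here is arbitrary
    return list(c), z, z_prime, tail


K = 40
bits = [(i * i + i // 3) % 2 for i in range(K)]   # arbitrary test pattern
perm = list(reversed(range(K)))                   # placeholder, NOT the LTE interleaver
x, z, z_prime, tail = turbo_encode(bits, perm)
n_coded = len(x) + len(z) + len(z_prime) + len(tail)
print(n_coded, K / n_coded)
```

With K = 40 the encoder emits 3K + 12 = 132 bits, which illustrates why the mother code rate is slightly below 1/3.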



V. SIMULATION RESULTS 

Considering the previous example of the tail-biting convolutional code in WiMAX systems and binary
transmission over an AWGN channel, the BP algorithm as in [4] is compared with the maximum-likelihood (ML) Viterbi-type
algorithm to decode the same tail-biting convolutional code [9, 12]. To determine by simulation the maximum decoding
performance capability of each algorithm, at least 300 codeword errors are detected at each SNR value. Figure 1 shows a
performance comparison between the two mentioned decoding algorithms for a payload size of 24 bits. Note that the
maximum number of iterations for the BP algorithm is 30. The simulation results show that the proposed BP
algorithm exhibits a slight performance penalty with respect to the ML Viterbi-type algorithm. However, since the BP decoder
is less complex than this traditional decoder and enables a unified decoding approach, this loss in BER performance is
deemed acceptable.
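
For reference, a BP decoder operating on an arbitrary parity-check matrix can be sketched as below. This is a generic log-domain sum-product decoder with a flooding schedule and a syndrome-based stopping rule, not the exact implementation used for the reported results. The sign convention (positive LLR means bit 0), the small (7, 4) Hamming matrix and the input LLR values are assumptions chosen to keep the example short.

```python
import math


def bp_decode(H, llr, max_iter=30):
    """Sum-product BP on the Tanner graph of H. Returns (hard decisions, iterations)."""
    m, n = len(H), len(H[0])
    checks = [[j for j in range(n) if H[i][j]] for i in range(m)]
    c2v = {(i, j): 0.0 for i in range(m) for j in checks[i]}   # check-to-bit messages
    hard = [0 if l >= 0 else 1 for l in llr]
    for it in range(1, max_iter + 1):
        # Total belief of each bit node from the channel and all checks.
        total = list(llr)
        for (i, j), msg in c2v.items():
            total[j] += msg
        # Horizontal step: update every check-to-bit message (tanh rule).
        new = {}
        for i in range(m):
            for j in checks[i]:
                prod = 1.0
                for j2 in checks[i]:
                    if j2 != j:
                        prod *= math.tanh((total[j2] - c2v[(i, j2)]) / 2.0)
                prod = max(min(prod, 1 - 1e-12), -(1 - 1e-12))   # keep atanh finite
                new[(i, j)] = 2.0 * math.atanh(prod)
        c2v = new
        # Vertical step: a posteriori LLRs and hard decisions.
        post = list(llr)
        for (i, j), msg in c2v.items():
            post[j] += msg
        hard = [0 if p >= 0 else 1 for p in post]
        if all(sum(hard[j] for j in checks[i]) % 2 == 0 for i in range(m)):
            return hard, it                                      # valid codeword found
    return hard, max_iter


# Hypothetical test case: all-zero codeword, one unreliable bit with the wrong sign.
H_hamming = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]
decoded, iterations = bp_decode(H_hamming, [2.0, 2.0, 2.0, -0.5, 2.0, 2.0, 2.0])
print(decoded, iterations)
```

The same routine accepts any binary parity-check matrix, which is the property that makes a unified decoder for tail-biting convolutional and turbo codes possible once their H matrices are available.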

In addition, a comparison between the BP and ML Viterbi algorithms for the same short-length code is shown
in Figure 2. In this case, a loss of 1.85 dB or less in FER compared with the traditional decoder is observed.

Figure 1. Comparisons of BER for length-24 rate-1/2 tail-biting convolutional code.

Figure 2. Comparisons of FER for length-24 rate-1/2 tail-biting convolutional code.

In Figure 3, we report simulation results for the AWGN channel for the LTE turbo code that was studied in the 
previous section. When compared to the traditional MAP and SOVA decoders [11, 12], the BP algorithm is about 1.7 dB
worse at a BER value of $10^{-2}$. Also, since we obtained a general form for the parity-check matrices of tail-biting
convolutional and turbo codes, we can enhance the performance by investigating other decoding algorithms which are also
applicable to LDPC codes.

For further research, we propose exploring alternatives to the flooding schedule usually adopted for LDPC codes to 
enhance the BER performance. 




Figure 3. Comparisons of BER for length-40 rate-1/3 turbo code.

VI. COMPLEXITY COMPARISON 

A direct comparison between the complexity of different decoding algorithms is implementation dependent.
Starting with the traditional decoders for turbo codes, the MAP process computes the log-likelihood for all paths in the
trellis. The MAP algorithm estimates the metric for both a received binary zero and a received binary one, then compares them
to determine the best overall estimate. The SOVA process only considers two paths of the trellis per step: the best path with
a data bit of zero and the best path with a data bit of one. In addition, it utilizes the difference of the log-likelihood function
for each of these paths. Overall, the SOVA is the less complex of the two algorithms in terms of number of
calculations [14]. Finally, for the BP algorithm, the decoding complexity per iteration grows linearly with the number of
edges E in the graph (the number of messages passed per iteration is twice the number of edges). Moreover, one can argue
that the complexity of the operations at the variable and check nodes frequently scales linearly with



\[ E = \sum_i n v_i \, i = \sum_j n c_j \, j \tag{19} \]

Following the notations of Luby et al. [13], consider a Tanner graph with n left nodes, where $v_i$ represents
the fraction of left nodes of degree i and $d_v$ (resp. $d_c$) is the variable node degree (resp. check node degree). Also, $c_j$
is defined to be the fraction of right nodes of degree j > 1.
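
The edge count that drives the per-iteration cost of BP can be obtained directly from a parity-check matrix. The sketch below counts edges from the left (bit) side using the node fractions $v_i$ and, as a cross-check, from the right (check) side using plain node counts, so that no assumption about the normalization of $c_j$ is needed. The matrix and the function name are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def edge_count(H):
    """Number of Tanner-graph edges, computed from both sides of the graph."""
    n = len(H[0])
    left = Counter(sum(col) for col in zip(*H))    # bit-node degree -> count
    right = Counter(sum(row) for row in H)         # check-node degree -> count
    v = {i: Fraction(cnt, n) for i, cnt in left.items()}
    e_left = sum(n * v_i * i for i, v_i in v.items())
    e_right = sum(j * cnt for j, cnt in right.items())
    assert e_left == e_right == sum(map(sum, H))   # all three counts must agree
    return int(e_left)


# Hypothetical example: parity-check matrix of the (7, 4) Hamming code.
H_example = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]
E = edge_count(H_example)
print(E, 2 * E)   # 12 edges, 24 messages per flooding iteration
```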

The complexity comparisons of the various decoding algorithms are shown in Table I where k is the number of 
systematic bits and v is the memory order of the encoder. The table gives the operations per iteration for MAP, SOVA, and 
BP decoding for the horizontal (H) and vertical (V) step. Note that, for the BP algorithm, the complexity per information bit 
is [20] 

Table I
Decoder Complexity Comparisons

Considering Example 3, Figure 4 shows a comparison between the algorithms mentioned above in terms of the number
of operations required in their implementations. In comparison with MAP and SOVA decoders, BP exhibits the lowest
implementation complexity over all the required operations.

Figure 4. Comparison of MAP, SOVA, and BP decoders in terms of number
of operations.

VII. CONCLUSION 

In this paper, the feasibility of decoding arbitrary tail-biting convolutional and turbo codes using the BP algorithm
was demonstrated. Using this algorithm to decode the tail-biting convolutional code in WiMAX systems speeds up the error
correction convergence and reduces the decoding computational complexity with respect to the ML-Viterbi-based algorithm.
In addition, the BP algorithm performs a non-trellis-based forward-only algorithm and has only an initial decoding delay,
thus avoiding intermediate decoding delays that usually accompany the traditional MAP and SOVA components in LTE
turbo decoders. However, with respect to the traditional decoders for turbo codes, the BP algorithm is about 1.7 dB worse at
a BER value of $10^{-2}$. This is because the nonzero element distribution in the parity-check matrix is not random enough.
Also, there are a number of short cycles in the corresponding Tanner graphs. Finally, as an extended work, we propose the
BP decoder for these codes in a combined architecture which is advantageous over a solution based on two separate decoders
due to efficient reuse of computational hardware and memory resources for both decoders. In fact, since the traditional turbo
decoders (based on MAP and SOVA components) have a higher complexity, the observed loss in performance with BP is
more than compensated by a drastically lower implementation complexity. Moreover, the low decoding complexity of the
BP decoder brings about end-to-end efficiency since both encoding and decoding can be performed with relatively low
hardware complexity.